KMID : 0644020200330020051
|
|
Journal of Korean Medical Classics 2020 Volume 33 No. 2 p.51 ~ p.59
|
|
A Comparative Study of Feature Extraction Methods for Authorship Attribution in the Text of Traditional East Asian Medicine with a Focus on Function Words
|
|
Oh Jun-Ho
|
|
Abstract
|
|
|
Objectives : This study seeks to identify the most appropriate "feature" for effective authorship attribution in texts of Traditional East Asian Medicine.
Methods : Using 'Variorum of the Nanjing' as the experimental corpus, the authorship attribution performance of a Support Vector Machine (SVM) was compared by cross-validation across three choices: function words or content words, single words or collocations, and whether or not IDF weights were applied.
Results : The combination 'function words/uni-bigram/TF' performed best, with an accuracy of 0.732, while the combination 'content words/unigram/TFIDF' showed the lowest accuracy, 0.351.
Conclusions : These results suggest the following about authorship attribution in texts of East Asian traditional medicine. First, function words play a more important role than content words. Second, collocations were relatively important among content words, whereas single words were more important among function words. Third, unlike in general text analysis, IDF weighting resulted in worse performance.
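
The three feature choices compared in the Methods can be illustrated with a minimal sketch. It is not the author's implementation: the function-word list, the one-character-per-token split, the way bigrams are formed from adjacent retained tokens, and the plain log(N/df) form of IDF are all assumptions made here for illustration, and the SVM and cross-validation steps are left out.

```python
import math
from collections import Counter

# Hypothetical function-word list (common classical Chinese particles).
# The abstract does not give the list actually used in the study.
FUNCTION_WORDS = {"之", "也", "者", "而", "其", "於", "以"}


def tokens(text, use_function_words=True):
    """Split into single characters (an assumption) and keep either
    the function words or the content words."""
    chars = [c for c in text if not c.isspace()]
    return [c for c in chars if (c in FUNCTION_WORDS) == use_function_words]


def ngrams(toks, use_bigrams=True):
    """Return unigrams, optionally followed by bigrams of adjacent kept tokens."""
    grams = list(toks)
    if use_bigrams:
        grams += [a + b for a, b in zip(toks, toks[1:])]
    return grams


def tf(grams):
    """Relative term frequency of each feature within one document."""
    counts = Counter(grams)
    total = sum(counts.values())
    return {g: n / total for g, n in counts.items()} if total else {}


def tfidf(docs_grams):
    """TF weighted by log(N / df), computed over a list of documents."""
    n_docs = len(docs_grams)
    df = Counter(g for grams in docs_grams for g in set(grams))
    return [
        {g: w * math.log(n_docs / df[g]) for g, w in tf(grams).items()}
        for grams in docs_grams
    ]


# Toy documents, invented for this sketch (not taken from the corpus).
docs = ["氣之行也", "血者氣之母也"]
features = [ngrams(tokens(d)) for d in docs]
tf_vectors = [tf(f) for f in features]
tfidf_vectors = tfidf(features)
```

Each dictionary in `tf_vectors` or `tfidf_vectors` is one document's sparse feature vector, which could then be passed to a classifier. With this form of IDF, a feature that occurs in every document receives a weight of zero.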
|
|
KEYWORD
|
|
authorship attribution, function words, Korean Medical Classics, East Asian traditional medicine, Variorum of the Nanjing